To explain the discrepancy between median scores on
the 1976 administration of the California Achievement Tests (CAT) and
the Sequential Tests of Educational Progress (STEP) in the Austin
Independent School District (AISD), ten technical variables typical
of achievement tests were considered as explanations. (1) The STEP
may measure different skills than the CAT. (2) Norm groups may
differ. (3) The STEP may not measure what the high schools are
teaching. (4) Curriculum sequencing of AISD high schools may not
conform to that of the norm group. (5) Cross-level curriculum
planning between elementary, junior high, and senior high levels may
not be coordinated. (6) The AISD population may differ from the
national population and hence from the norms. (7) Test familiarity
may play a role in score differences. (8) The STEP is a more
difficult test than the CAT. (9) The time of year of the
administration may have depressed STEP scores. (10) Administration
procedures differed from those used in the norming study. Of the ten
variables considered, all but number 3 and possibly number 4 were
accepted as possible explanations for the score differences. A
further comparison was made between the CAT, STEP, and the CTB/McGraw
Hill Proficiency and Review Tests for Reading and Numerical
Proficiency. (CP)
Comparing Scores on the

California Achievement Test (CAT)

to Scores on the

Sequential Test of Educational Progress (STEP)



Achievement tests are not stable measuring instruments like meter
sticks and thermometers. On the contrary, they are what we might call
"approximation instruments" because they measure the very difficult to
capture construct called "human knowledge." Achievement test scores
depend on two factors. The first is the human knowledge factor itself.
If you have ever had the experience of being unable to remember the name
of a person or thing you know quite well, the measurement problem associated
with this factor will be easy to understand. One's knowledge is dependent
on such things as one's emotional state, the setting in which the knowledge
must be used or recalled, and even the time of day. In addition to this
personal variability factor, however, achievement test scores are also subject
to the technical variability through which the tests and the scores are
derived. A meter measure, for example, can always be referred to one world
standard maintained in France. Achievement tests have no such common referent;
they are dependent rather on a number of varying elements. Among these are:
differing contents, differing norm group compositions, and differing levels
of difficulty of the items of the test. These facts about achievement
tests have to be taken into account when we look at disparate AISD median
scores on different achievement tests. As the Office of Research and
Evaluation (ORE) has considered the differences found this year on the two
primary achievement tests we use, which are referenced in the title, the
following explanations have been considered.



1. The STEP may measure different things than the CAT.

ORE finds this to be true. There is good evidence (see attachment 1)
that the CAT is weighted toward the measurement of what we might call
minimal basic skills while the STEP is measuring higher level academic
competencies. Moreover, it may be that the possession of the minimal
basic skill is a necessary, but not sufficient, preparation for those
higher level academic skills. Thus, a high score on the CAT would be
necessary to achieve a high score on the STEP, but just because one
had a high score on the CAT, he would not be guaranteed a high score on
the STEP unless he also had much additional competency over and above
that measured by the CAT.



2. The STEP norm group may be different from the CAT norm group.

ORE feels this also may be true. National norms are presumed to be
representative of the national school population make-up. However,
different companies define their own norm groups and there is no
national standard for this. Thus, for example, one company may include
private schools in their population, another may not. Also, test
companies cannot force schools or students to participate in their
norming group, and economics prevent their giving much economic reward
for doing so. Therefore, norm groups rarely conform to precise sampling
requirements necessary for true population representativeness. School
systems thus suffer from the lack of a true national achievement standard.






There is some evidence that the STEP norm group and the CAT norm group
are discrepant, based on evidence from the Anchor Test Study (a national
study sponsored by the Office of Education that seeks to equate tests at
certain grade levels). For example, if we compare one level of the STEP
(a lower grade level than the one AISD uses is the only one included in the
Anchor Test Study) Reading test to the CAT Reading Total, we consistently
find a 4 to 5 percentile difference (see chart below). It is reasonable
to expect that the same kind of difference will be found at the higher
levels of the tests.



Predicted STEP Percentiles for the 50th Percentile of the CAT

Grade   CAT Reading Total                      STEP Reading
        (%iles assuming March-June testing)    (%iles assuming April testing)

4       50%ile (level 3) = Raw Score of 42     Raw Score of 31 = 46%ile (level 4)
5       50%ile (level 3) = Raw Score of 53     Raw Score of 37 = 44%ile (level 4)
6       50%ile (level 4) = Raw Score of 41     Raw Score of 42 = 47%ile (level 4)

Converted Scale Score of 435 = Raw Score of 27 (level 3)
Based on Table I, page 9; Table 8, and Table 15 of the Anchor Test
Study User's Manual - Equivalency and Norms Tables. Berkeley, California:
Educational Testing Service, 1973.
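The size of the gap shown in the chart can be tallied directly. The following Python sketch is only an illustration and is not part of the ORE analysis: the variable names are hypothetical, and the only inputs are the three predicted STEP percentiles given in the chart for a CAT standing at the 50th percentile.

```python
# Predicted STEP Reading percentiles (taken from the chart) for a student at
# the CAT Reading Total 50th percentile, in the order the rows appear.
CAT_PERCENTILE = 50
predicted_step_percentiles = [46, 44, 47]

# Gap between the CAT standing and the predicted STEP standing, row by row.
gaps = [CAT_PERCENTILE - p for p in predicted_step_percentiles]
mean_gap = sum(gaps) / len(gaps)

print(gaps)                # [4, 6, 3]
print(round(mean_gap, 1))  # 4.3
```

The average gap of roughly 4 percentile points is in line with the difference described in the text.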



One factor, the period of norming, would not appear to account for
a discrepancy. Both tests were normed at approximately the same time,
1970. Both norms, incidentally, might now be considered out-of-date,
and in view of national evidence of lower achievement this may result in
Austin looking less well than it would were comparisons based on current
national achievement score levels.

3. The STEP may not measure what the high schools are teaching.

ORE cannot accept this hypothesis for two reasons. First, the STEP
was one of three test batteries selected by school and central office
staff as being adaptable for AISD high school curriculum (see Figure B-
in the Systemwide Evaluation Technical Report 1975-76). In addition,
the upward movement of scores from 9th to 12th grade indicated in the
graph below suggests a situation in which students are increasingly
matching up to a curriculum.
[Graph: scores by grade level, grades 9 through 12]



4. The curriculum sequencing of AISD high schools may not match up to
the norm group schools' sequencing.

ORE cannot adequately evaluate this hypothesis since the norm group
school curriculum sequencing is unknown. However, one would expect
national sampling to adjust for such difference. An example may
serve to clarify this hypothesis. Say chemistry were nationally
taught in the 9th grade and in Austin in the 12th grade. This would
mean that AISD students would miss those chemistry items on the test
until they reached the 12th grade. Some coordinators have expressed
the feeling that this may be a factor, and higher 12th grade scores may
tend to confirm this as a possibility. However, one would expect
that total scores would compensate for such a factor since all students
receive the same test in all grades 9-12. That is, the student who
had not yet had chemistry might get items, say, in physics and thus
compensate for the difference in sequencing.

5. The elementary, junior high, and senior high school curricula may not
match up in AISD.

ORE feels that this also may be true. To some extent, of course, a
perfect match would not be expected. Students begin to elect
different scholastic pathways at the senior high level as they
begin to prepare for future careers. However, the discrepancy between
9th and 12th grade scores and teacher comments about student preparation
as they enter 9th grade suggest a discrepancy between high school
entry expectation and earlier preparation. Moreover, there has
traditionally been little cross-level curriculum planning that would
lead to articulation between these school levels. Instructional
coordinators and directors might well consider the possibility of this
hypothesis.

6. AISD's population may differ from the national population and hence from
the norm group population of the STEP or the CAT.

On the basis of vast national evidence that the composition of the
school population on non-school factors will itself have an effect on
achievement, it is to be expected that a school population make-up
discrepant from the national make-up will affect percentile standings.
Austin's school population differs in composition from the national
population on a number of counts. To the degree the test norms might be
biased toward national or AISD make-up, we might expect greater or less
conformity of scores on the two tests.

7. Test familiarity may play a role in score differences.

AISD has been using the CAT for four years. Unconsciously even,
personnel in AISD may have internalized the test content and have
tailored instruction toward the curriculum content of the test.
Also, students in grade 8 have taken the same test three years
in a row. They too may be unconsciously learning toward the
tests. This suggests percentiles at grade 6 should be closer to
STEP percentiles than those at grade 8 as, indeed, is the case.

8. The STEP is a more difficult test than the CAT. 

This is comparable to saying that items 1 and 2 above are true.

It does appear to ORE that the STEP is a more challenging test.



9. The time of year for the administration may have depressed STEP
scores.

This also may well be true. The only time in which the STEP
could be scheduled in the 1975-76 school calendar was the week before
and after Easter, and no make-ups could be scheduled. This time could
have affected both attendance at the test and student attitude toward
the test. The CAT in grades 1-6 was given in April two weeks prior
to Easter and in grades 7-8 in February, and make-ups were given.

10. Deviation of administration procedures from those used in the
norming study.

The CAT consists of only four subtests (2 reading, 2 math) while
eight STEP subtests were given. In the STEP norming no more than
2 subtests were given per day; in the AISD administration, again
for scheduling reasons, all 8 subtests were given in 2 days. The
CAT was given over a 2 day period such that only two subtests
were given each day. Thus, fatigue may have acted to depress STEP
scores. If 9th graders were assumed to be more easily subject to
fatigue than seniors, the 9th to 12th grade upward movement of the
scores would also tend to support this possibility.
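The daily testing load behind this fatigue argument can be worked out from the counts in the paragraph above. The Python sketch below is illustrative only; the function and variable names are hypothetical, and it assumes the subtests were spread evenly across the testing days.

```python
# Testing load per day, using only the counts given in the text and
# assuming an even split of subtests across days.
def subtests_per_day(subtests: int, days: int) -> float:
    """Average number of subtests a student sat on each testing day."""
    return subtests / days

STEP_NORMING_MAX_PER_DAY = 2        # "no more than 2 subtests were given per day"
step_aisd = subtests_per_day(8, 2)  # 8 STEP subtests given in 2 days
cat_aisd = subtests_per_day(4, 2)   # 4 CAT subtests given over 2 days

print(step_aisd, cat_aisd)                   # 4.0 2.0
print(step_aisd / STEP_NORMING_MAX_PER_DAY)  # 2.0
```

Under that assumption, AISD students sat twice as many STEP subtests per day as the norming procedure allowed, while the CAT schedule stayed at two per day.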
The Content of Achievement Tests

in use in the

Austin Independent School District



There has been some thought given to the possible need for a
"minimal skills proficiency" test for the Austin Independent School
District (AISD). This led the Office of Research and Evaluation to
prepare a comparison of the two current achievement tests used by AISD
to one frequently used "minimal skills proficiency" test, the CTB/McGraw
Hill Proficiency and Review Tests for Reading and Numerical Proficiency
(popularly known as the Denver tests because they began as a test
series designed for use in the Denver Public Schools). This comparison
is of interest for two reasons. First, it appears that Level 4 of the
California Achievement Test (CAT) now being used in AISD sixth to eighth
grades is an adequate measure of the same thing tested by the proficiency
test, particularly in mathematics. Moreover, it is evident that there is
a great difference between the CAT and the Sequential Test of Educational
Progress (STEP), with the STEP measuring skills and content at what would
commonly be accepted as a much higher level than that of the CAT.
The two tables following contain the comparison first to the
Proficiency and Review Test for Numerical Proficiency and second among
the three reading tests.



Mathematics

Numerical
Proficiency &
Review Test
Item           Item Description

 1             Add
 2             Add 2, 3-place numbers
 3             Add 4, 3-place numbers
 4             Add fractions requiring a conversion (formula)
 5             Add money including hundreds-of-dollars and cents
 6             Add decimals (formula)
 7             Add ft. and inches
 8             Add mixed fractions
 9             Add decimals (2 place plus 3 place)
10             Add hrs. and mins. (3 sets)
11             Subtract 3-place numbers
12             Subtract 3-place numbers
13             Subtract money (formula)
14             Subtract 4-place numbers
15             Subtract dollars + cents from tens of dollars + cents
16             Subtract 2, 3-place numbers + mixed fractions
17             Subtract 1-place decimal from 3-place decimal
18             Subtract mixed decimal + whole numbers
19             Subtract hours + mins. from hours only
20             Subtract ft. + inches from ft. + in. (carrying necessary)
21             Take dollars written out and show as figures
22             Translate % to decimal
23             Translate % to fraction
24             Translate written numbers to figures
25             Translate fraction to decimal
26             Translate written fraction to decimal figure
27             Take percentage of money
28             Take percentage of money
29             Find largest fraction
30             Find largest decimal
31             Multiply 3-place number by 1-place number
32             Multiply 3-place number by 2-place number
33             Multiply decimals
34             Multiply whole number by fraction
35             Multiply fraction by fraction
36             Multiply mensuration (ft. x ft., etc.)
37             Multiply 3 or 4-place number by 3-place
38             Multiply mensuration (ft. x ft., etc.)
39             Multiply decimals
40             Multiply mixed fractions
41             Divide by 1-place number

CAT* Comparable Item          STEP Comparable Item



i • • 

. 2 
3 

9 & 10 
1 

19 & 20 

18 
11 & 12 
19- i 20/ 

6& 5 

6 & 5 
14 

^ & 8 

f 1^ 



23 

15,16,21\22 
24 V 



f 



10 ' 

5. > 
7 & 14 

11 
1 



Z6 

3 ^ l2 



12 



13 & 17 
8 

11 & 15 
11 & 15 

.3- 
22 : 
25 

26 & 27. 
■ 42 

33 
34 & 35 

41 
'.28 

43 
36 
31 & 29 
30- 



32 
34 



34 & 36 



38 
55 



A 



21 





Mathematics Continued

Numerical
Proficiency &                                 CAT*               STEP
Review Test                                   Comparable         Comparable
Item           Item Description               Item               Item

42             Divide by 3-place number                          30
43             Divide by decimal              39                 16
44             Divide ft. or gallons          -                  -
45             Divide decimal                 -                  22
46             Divide decimal
47             Divide ft. or gallons          -                  -
48             Divide by 2-place number       38
49             Divide by fraction             47
50             Divide fraction by fraction    45, 46, 48, 49



*CAT subtests are independently numbered; thus duplicate numbers here do not necessarily
indicate duplicate items.

CAT and STEP Items beyond the Numerical

Proficiency & Review Test Items



STEP

Computation: Items 6, 9, 13, 15-17, 19-20, 23-25, 27-30, 31-33, 35, 39-54,
56-60.

Examples (Patterned after test items, but not actual
test items):

\ (1/3 + 1/3) . > ? ] \J 

~ (1/4 + 1/4) t (1/4 + 1/4) , ' — . 



/ 



The average of 6, 9, 7, 0, 4 and 1 is ?
If r = 69 and d = 24, then rd = ?

Basic Concepts: Items 1-50

Examples (Patterned after test items, but not actual
test items):

If the area of a triangle is 64, then the area of
the parallelogram (with a picture of a parallelogram
in which the triangle is embedded) is ?

\ ■' } (692)^ - ? ' ' ..: :v;>-V" 

If n - 7 > 21, then n can be ?

/ ' ■ + d - (c+d)(c+d) -i ? ' ^ 



CAT and STEP Items beyond the Numerical
Proficiency & Review Test Items

Continued



CAT

Computation: Items 4, 17, 44.

Example (Patterned after test item, but not actual item):

3 x (-4) = ?

Concepts: Items 1, 2, 4, 6, 7, 9-11, 13-35.

Examples (Patterned after test items, but not actual items):

means the same as ?

2,000 ; means ?

Problems: Items 1-8, 10, 14

Examples (Patterned after test items, but not actual items):

One box weighs 10 pounds, another 12, and a third 17
pounds. What is their average weight?

John bought a refrigerator for $600. He paid $100
down and will pay the rest in 10 equal payments.
How much will each payment be?

A triangle base is 10 inches; its height is 6 inches.
What is its area?
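To give a sense of the level of skill these CAT problem items call for, the three patterned examples above can be worked in a few lines. The Python sketch below is an illustration only, with hypothetical variable names; it reads the refrigerator price as $600 and is not drawn from either test or from the ORE analysis.

```python
# Worked answers to the three illustrative CAT-style word problems.
weights = [10, 12, 17]
average_weight = sum(weights) / len(weights)   # pounds

price, down_payment, n_payments = 600, 100, 10
payment = (price - down_payment) / n_payments  # dollars per payment

base, height = 10, 6
triangle_area = 0.5 * base * height            # square inches

print(average_weight, payment, triangle_area)  # 13.0 50.0 30.0
```

Each problem needs only one or two arithmetic operations, which is consistent with the description of the CAT as weighted toward basic skills.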
Reading

Vocabulary

CAT                         Reading Proficiency         STEP
                            & Review

40 items                    26 items                    30 items

Items of comparable nature, some more                   Identification of words,
difficult, on CAT. All are multiple choice.             phrases, and sentences in
                                                        context; difficulty may
                                                        be higher.

Comprehension

CAT                         Reading Proficiency         STEP
                            & Review

4 reading selections        3 reading selections        5 reading selections
(3 selections could be      (All could be labeled       (1 selection science,
science or social           science (health) or         3 literary including
studies, 1 could be         social studies)             drama dialogue)
labeled as math)            8 items/selection           5 to 8 items/selection
10 items/selection
+ table of contents
+ index
+ diagrams